-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: allow searching for similar images #21
feat: allow searching for similar images #21
Conversation
Hi, @ramayer. Thank you for opening this.
I'm cool with sticking to the one Wikimedia recommends for now and returning to this question when/if something will break.
Good. Thank you for letting me know. I suggest also checking that regular queries perform exactly as before.
It would've been interesting to see the performance difference. If you'll be benchmarking it, please, include the call to
I like this idea. We may also process all of the pictures (if multiple were passed) in a single batch. |
Thx for the feedback. I agree with everything; but probably won't get around to it until the weekend. |
Sry for the delay - I got distracted (actually on different aspects of rclip --- I was attempting to fine-tune CLIP to recognize my particular dog instead of all dogs in general - but didn't get too far) . Now estimating early this coming weekend. |
@ramayer, no worries. Fine-tuning sounds fun. Thank you :) |
I pushed an updated version with much of the above feedback.
Right now it's still in a number of separate commits; I'll squash them once it's getting close. |
rclip/model.py
Outdated
def image_from_file(self, query: str) -> Image.Image: | ||
path = query.removeprefix('file://') | ||
img = Image.open(path) | ||
return img |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think str.removeprefix
might be a python 3.9 dependency; and the Pipfile
calls for 3.8. I could replace it with a re.sub() expression if 3.8 support's needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ramayer, sounds good. Thank you!
@ramayer, thank you very much for the update. I indicated a few areas for improvement in the comment. |
Co-authored-by: Yurij Mikhalevich <0@39.yt>
@@ -8,6 +8,7 @@ clip = {git = "https://github.com/openai/CLIP.git"} | |||
torch = "==1.9.0+cpu" | |||
torchvision = "==0.10.0+cpu" | |||
pillow = ">=8.3.2" | |||
requests = "~=2.26" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add it via pipenv install "requests~=2.26"
instead of editing the Pipfile
?
This will also update Pipfile.lock
and will make the tests pass. I can do it if you don't want to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Struggling to get this to resolve without errors.
PS -- wondering what direction you want to take this project in the future. On my home machines it's now the main way I've been organizing my pictures; and I've hacked on a lot of features that might-or-might-not be out of scope for your vision for this project - including:
- De-dup --- Embeddings are cool - they can even find duplicate thumbnails of different sizes. I added a script that finds pictures with identical (or near identical) embeddings, and flag them as a duplicate in a column I added to SQLite.
- Support for explicit captions/tags/picking up words from folders --- for some images I already had them tagged with words that CLIP couldn't figure out. For example, folders of country names I've visited. I added a column for the CLIP embedding of the words I've manually tagged; and optionally also search that column. Without it, a search for 'germany' only returns about 1/10 of my pictures from germany.
- Face embeddings -- so I can search for a specific person. I'm using https://github.com/serengil/deepface to add a column for faces. Currently I can't input the name of a face - but using the "similar to an existing image" I can find them. Combining this with the "explicit tagging" feature above I'll soon be able to search for people by name (as long as I have tagged them at least once)
No idea if you consider those in or our of scope for the main project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ramayer, thanks for the update. I'll take a look at it.
About the direction:
- I would love to keep the rclip as a small and simple CLI tool, but considering building a more advanced app on top of it.
- I like the de-dup idea and think that it can be a part of the rclip CLI tool. But, I am not 100% sure about this one yet, so instead of filing a GHI, I created a Discussion here: Should we and how to add a de-dup functionality to rclip? #23.
- This is a must-have feature for a practical photo search tool, but implementing it will fundamentally change how
rclip
works. Also, if we will implement this inrclip
it will start output pictures that don't feature anything German in them (if though they were taken in Germany; for example, a picture of a generic bed made in the hotel). I feel like this feature can be a part of another tool that will incorporaterclip
. Or I should change how I think aboutrclip
😅 Let me take to some time to consider this. I also think that we may consider extracting manual/existing tags from the images themselves. - Face embeddings feature sounds great, and my general attitude to it is more open compared to the one I have towards indexing the folder names.
Thank you for sharing all these. All good ideas that I need to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ramayer, pipenv install "requests~=2.26"
worked fine for me. But, you shouldn't forget to execute pipenv shell
before doing anything with pipenv
. I mention it here, in the Contributing section: https://github.com/yurijmikhalevich/rclip#contributing. Executing pipenv shell
sets env variable from .env
which is required to install the PyTorch with pipenv
from the official PyTorch "repository."
Thank you very much for the update. I've added a few comments. Once they are addressed, we should be good to merge. |
Co-authored-by: Yurij Mikhalevich <0@39.yt>
Updated and merged. Thank you! |
@all-contributors please add @ramayer for code |
I've put up a pull request to add @ramayer! 🎉 |
Still a work in progress for #20 . Not yet ready for merging; but creating the pull request early for comments on the patches.
A couple open issues/questions:
What user agent string (if any) should rclip use when it makes http(s) requests? I think it should have a user agent, because some websites like Wikimedia Commons are pretty picky and block requests without one ( https://meta.wikimedia.org/wiki/User-Agent_policy ). I just put in a temporary placeholder for now, mostly so wikimedia.org doesn't complain when I test with things like:
./bin/rclip.sh -f https://upload.wikimedia.org/wikipedia/commons/c/c1/Variegated_golden_frog_%28Mantella_baroni%29_Ranomafana.jpg
So far this has only been minimally tested. I've been using this:
./bin/rclip.sh ./tmp/d2/img0015.jpg
looking for a score of 100% from the identical picture, and this
./bin/rclip.sh -t 999 "" --sub "" --sub ./tmp/d2/img0015.jpg | tail
looking at a score of -100% to make sure that adding and subtracting is working; but I haven't really tried many combinations yet. I haven't tried on Windows at all yet.
This commit undid the optimization from the previous pull request of calculating many text vectors in a single statement. Not sure it's important. If so, I suppose it could use
itertools.groupby
to group the user's inputs into text vs image requests; and at least process all the text ones in a single group.